Efficient Tree Mining Using Reverse Search
نویسندگان
چکیده
In this paper, we review our data mining algorithms for discovering frequent substructures in a large collection of semi-structured data, where both of the patterns and the data are modeled by labeled trees. These algorithms, namely FREQT for mining frequent ordered trees and UNOT for mining frequent unordered trees, efficiently enumerate all frequent tree patterns without duplicates using reverse search, which is a general scheme for designing efficient algorithms for hard enumeration problems, and incrementally compute of the occurrences of a pattern. We also discuss classes of trees to which reverse search is applicable, such as itemsets, sequential episodes, path trees, and graphs. Correspondence: Tatsuya Asai Department of Informatics, Kyushu University, 6-10-1 Hakozaki Higashi-ku, Fukuoka 812-8581, JAPAN E-mail: [email protected] phone: +81-92-642-2697, fax: +81-92-642-2698
منابع مشابه
GTRACE-RS: Efficient Graph Sequence Mining using Reverse Search
The mining of frequent subgraphs from labeled graph data has been studied extensively. Furthermore, much attention has recently been paid to frequent pattern mining from graph sequences. A method, called GTRACE, has been proposed to mine frequent patterns from graph sequences under the assumption that changes in graphs are gradual. Although GTRACE mines the frequent patterns efficiently, it sti...
متن کاملCHARM: An Efficient Algorithm for Closed Itemset Mining
The set of frequent closed itemsets uniquely determines the exact frequency of all itemsets, yet it can be orders of magnitude smaller than the set of all frequent itemsets. In this paper we present CHARM, an efficient algorithm for mining all frequent closed itemsets. It enumerates closed sets using a dual itemset-tidset search tree, using an efficient hybrid search that skips many levels. It ...
متن کاملEfficient Mining of High Utility Sequential Patterns Over Data Streams
High utility sequential pattern mining has emerged as an important topic in data mining. Although several preliminary works have been conducted on this topic, the existing studies mainly focus on mining high utility sequential patterns (HUSPs) in static databases and do not consider the streaming data. Mining HUSPs over data streams is very desirable for many applications. However, addressing t...
متن کاملBOSTER: An Efficient Algorithm for Mining Frequent Unordered Induced Subtrees
Extracting frequent subtrees from the tree structured data has important applications in Web mining. In this paper, we introduce a novel canonical form for rooted labelled unordered trees called the balanced-optimal-search canonical form (BOCF) that can handle the isomorphism problem efficiently. Using BOCF, we define a tree structure guided scheme based enumeration approach that systematically...
متن کاملSurvey of Efficient and Fast Nearest Neighbor Search For Spatial Query on Multidimensional Data
Spatial data mining is a special kind of data mining. Patterns, clusters, classifications, etc., can be derived from the big data available. Especially, nearest neighbor search approach with respect to a query point plays a key role in arriving at the final decision making. Like Computer Integrated Manufacturing, Facility Layout, Cellular Manufacturing, nearest neighbor search has been found se...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2003